A Quality, Size and Time Assessment of the Binarization of Documents Photographed by Smartphones

Smartphones with an in-built camera are omnipresent today in the life of over eighty percent of the world’s population. They are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This paper assesses the quality, file size and time performance of sixty-eight binarization algorithms using five different versions of the input images. The evaluation dataset is composed of deskjet, laser and offset printed documents, photographed using six widely-used mobile devices with the strobe flash off and on, under two different angles and four shots with small variations in the position. Besides that, this paper also pinpoints the algorithms per device that may provide the best visual quality-time, document transcription accuracy-time, and size-time trade-offs. Furthermore, an indication is also given on the “overall winner” that would be the algorithm of choice if one has to use one algorithm for a smartphone-embedded application.


Introduction
The current number of smartphone users in the world today is over 6.6 billion (Source: https://www.bankmycell.com/blog/how-many-phones-are-in-the-world, last visited on 29 December 2022), which means that over 83% of the world's population owns a smartphone. The omnipresence of smartphones with in-built cameras has led most people (91%) to take photos with smartphones, while only 7% use digital cameras and 2% use tablets. According to that same website, based on forecast figures by Ericsson and the Radicati Group, that percentage is expected to grow from 91% in 2022 to 94% in 2026. Consumers see the quality of the camera as a key factor in choosing a smartphone model. Thus, since cameras became the most significant selling point of smartphones, manufacturers have been putting much effort into improving their quality. At first, they paid more attention to the number of megapixels a smartphone camera could pack. In the last few years, smartphone manufacturers have opted to add more cameras to their phones to improve photo quality and optical zoom functionality while keeping the device thin. Each camera has a lens that can yield either a wide shot or a zoomed-in shot. Some phones have additional black and white cameras for increased light sensitivity, while others offer depth information. Data from the different cameras can be combined into a clear photo with seemingly shallow depth-of-field and good low-light capability.
Taking photos of documents with smartphone cameras, a practice that started almost two decades ago [1][2][3][4], is widespread today. It is extremely simple and saves photocopying costs, allowing the document image to be easily stored and shared using computer networks. However, smartphone cameras were made to take family and landscape photos or make videos of such subjects and were not targeted at document image acquisition. Smartphone document images have several problems that make them challenging to process. The resolution and illumination are uneven, there are perspective distortions, and there is often interference from external light sources [4]. Even the in-built strobe flash may add further difficulties if activated by the user or automatically. Besides all that, the standard file format used by smartphone cameras to save the images is JPEG, which inserts the JPEG noise [5], a light white noise added to prevent two pixels of the same color from appearing next to each other. This noise makes the final image more pleasant to the human eye glancing at a landscape or family photo, but it also means a loss in sharpness in a document image, making any further processing more difficult.
The conversion of a color image into its black-and-white version is called thresholding or binarization. It is a key step in the pipeline of many document processing systems, including document content recovery [6]. The binarization of scanned document images is far from being a simple task, as physical noises [7], such as paper aging, stains, fungi, folding marks, etc., and back-to-front interference [8] increase the complexity of the task. In the case of scanned documents, some recent document binarization competitions [9,10] show that no single binarization algorithm is efficient for all types of text document images. Their performance depends on a large number of factors, such as the digitization device, the image resolution, the kind of physical noises in the document [7], the way the document was printed, typed or handwritten, the age of the document, etc. Besides that, those competitions showed that the time complexity of the algorithms also varies widely, making some of them unusable in any document processing pipeline. Thus, instead of having an overall best, those competitions pointed out the top quality-time algorithms in several categories of documents.
The binarization of photographed documents is far more complex than scanned ones and, as already mentioned above, the resolution and illumination are uneven, among several other problems. Besides that, each smartphone model has different camera features. The first competition to assess the quality and time of the binarization of smartphone camera-acquired text documents, the type of document that is most often photographed, comparing new algorithms with previously published and more classical ones was [11]. In 2021, that same competition occurred with several new competitors and devices [12].
Binary images are also much smaller than their color counterparts, thus their use may save storage space and network bandwidth [13]. This means that assessing the resulting image file size using a lossless compression scheme is also relevant for comparison among binarization algorithms. Besides that, the binary image may be the key for generating colored synthetic images, which are visually indistinguishable from the original document whenever printed or visualized on a screen [14]. Run-length encoding [15] of the sequences of black and white pixels is the key to several schemes for compressing monochromatic images. If the binarization process leaves salt-and-pepper noise in the final image, sometimes imperceptible to the human eye, that noise will break the sequences of similar pixels, degrading the performance of the image compression scheme. Indirectly, that can also be taken as a measure of the quality of the monochromatic image. The third edition [16] of the ACM DocEng Competition on the binarization of photographed documents assessed five new and sixty-four algorithms, and it was possibly the first time the size of the monochromatic image was considered in the assessment of binarization algorithms.
Reference [17] shows that feeding the binarization algorithms with the individual red, green and blue (RGB) channels, instead of the whole image, may yield a better quality two-tone image, besides saving processing time. This paper largely widens the scope of [16], in which, due to the restricted time to produce the final report, it was impossible to process and assess the quality, time, and file size of the almost 350 binarization schemes. Besides that, also due to processing time limitations, the file-size assessment in [16] was ranked based on the quality of the optical character recognition (OCR) transcription, measured by the Levenshtein distance to the ground-truth text. In contrast, this paper ranks the algorithms based on a new image quality measure introduced here, possibly a more adequate one.
The recent paper [18] presents a methodology to pinpoint which binarization algorithm would provide the best quality-time trade-off either for printing or for OCR-transcription. It also proposes an overall winner if one would choose one single algorithm capable of being embedded in applications in a smartphone model. The present paper also makes such choices for each of the smartphones assessed.

Materials and Methods
Six different models of smartphones from three different manufacturers, widely used today, were used in this assessment. Their camera specifications are described in Table 1. Their in-built strobe flash was set on and off to acquire images of offset, laser, and deskjet printed text documents, photographed in four shots with small variations in position and at different moments, to allow for different interfering light sources. The document images captured with the six devices were grouped into two separate datasets, Dataset 1 and Dataset 2. Dataset 2 also has challenging images, but they are less complex than those of Dataset 1. The test images were incorporated into the IAPR (International Association for Pattern Recognition) DIB (Document Image Binarization) platform (https://dib.cin.ufpe.br, accessed on 17 January 2023), which focuses on document binarization. It encompasses several datasets of document images of historical, bureaucratic, and ordinary documents, which were handwritten, machine-typed, offset, laser, and ink-jet printed, both scanned and photographed, several of them with their corresponding ground-truth images. Besides being a document repository, the DIB platform encompasses a synthetic document image generator, which allows the user to create over 5.5 million documents with different features. As already mentioned, reference [17] shows that binarization algorithms, in general, yield images of different quality when fed with the color image, its gray-scale conversion, or the R, G, and B channels. Here, 68 classical and recently published binarization algorithms are fed with the five versions of the input image, totaling 340 different binarization schemes. The complete list of the algorithms used is presented in Table 2, along with a short description and the approach followed in each of them.

Method | Year | Category | Description
Percentile [20] | 1962 | Global threshold | Based on partial sums of the histogram levels
Triangle [21] | 1977 | Global threshold | Based on most and least frequent gray level
Otsu [22] | 1979 | Global threshold | Maximize between-cluster variance of pixel intensity
IsoData [23] | 1980 | Global threshold | IsoData clustering algorithm applied to image histogram
Pun [24] | 1981 | Global threshold | Defines an anisotropy coefficient related to the asymmetry of the histogram
Johannsen-Bille [25] | 1982 | Global threshold | Minimizes formula based on the image entropy
Kapur-SW [26] | 1985 | Global threshold | Maximizes formula based on the image entropy
Moments [27] | 1985 | Global threshold | Aims to preserve the moment of the input picture
Niblack [28] | 1985 | Local threshold | Based on window mean and the standard deviation
Bernsen [29] | 1986 | Local threshold | Uses local image contrast to choose threshold
MinError [30] | 1986 | Global threshold | Minimum error threshold
Mean [31] | 1993 | Global threshold | Mean of the grayscale levels
Shanbhag [32] | 1994 | Global threshold | Improves Kapur-SW by viewing the two pixel classes as fuzzy sets
Huang [33] | 1995 | Global threshold | Minimizes the measures of fuzziness
Yen [34] | 1995 | Global threshold | Multilevel threshold based on maximum correlation criterion
RenyEntropy [35] | 1997 | Global threshold | Uses Renyi's entropy similarly as Kapur-SW method
Sauvola [36] | 1997 | Local threshold | Improvement on Niblack
Li-Tam [37] | 1998 | Global threshold | Minimum cross entropy
Wu-Lu [38] | 1998 | Global threshold | Minimizes the difference between the entropy of the object and the background
Mello-Lins [13] | 2000 | Global threshold | Uses Shannon Entropy to determine the global threshold. Possibly the first to properly handle back-to-front interference
Wolf [39] | 2002 | Local threshold | Improvement on Sauvola with global normalization

The quality of the final monochromatic image is the most important assessment criterion.
Once one has the top-quality images, one may consider the mean size of the monochromatic files and the mean time taken by each of the assessed algorithms over the dataset. This paper proposes a novel quality measure for photographed document images called PL, which combines the previously proposed P_err [70] and [L_dist] [11] measures. Two quality measures were used to evaluate the quality of the binarization algorithms: [L_dist] and PL.

The Quality Measure of the Proportion of Pixels (P_err)
Assessing image quality of any kind is a challenging task. The quality of photographed documents is particularly hard to evaluate, as the image resolution is uneven, it strongly depends on the features of the device and on the distance between the document and the camera, and it even suffers from perspective distortion. Creating a ground-truth (GT) binary image for each photographed document would require a prohibitive effort. An alternative method [70] was used: the paper sheet or book page is scanned at 300 dpi, binarized with several algorithms, visually inspected, and manually selected and retouched to provide the best possible binary image of that scanned document, which will generate the reference proportion of black pixels for that document image. The P_err measure compares the proportion of black pixels in the scanned and photographed binary documents, as described in Equation (1):

\[ P_{err} = \left| PB_{GT} - PB_{bin} \right| \qquad (1) \]

where PB = 100 × (B/N) is the proportion of black pixels in the image, B is the total number of black pixels and N is the total number of pixels in the image. Thus, PB_bin is the proportion of black pixels in the binary image and PB_GT is the proportion of black pixels in the scanned ground-truth image.
In order to provide a fair assessment, the photographed image must meet several requirements. The resolution of the output document photo must be close to 300 dpi (which corresponds to that of the scanned one). To meet such a requirement, the camera should have around 12 Mpixel resolution and the document should fill nearly all the photographed image; the photo must be cropped to remove any remaining border. Here, the cropping is done manually, as the focus is specifically on assessing the binarization algorithms. Figure 1 describes the preparation of the images and an example of the P_err calculation. The P_err was used by the last DocEng contests [11,12,16] to evaluate the quality of the binary images for printing and human reading.
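The computation described in this section can be sketched as follows. This is a minimal illustration, assuming that Equation (1) is the absolute difference between the two black-pixel proportions and that a binary image is given as rows of 0 (black) and 1 (white); the function names and the toy images are hypothetical and not taken from the dataset.

```python
def black_pixel_proportion(image):
    """Return PB = 100 * B / N for a binary image given as rows of 0 (black) / 1 (white)."""
    total = sum(len(row) for row in image)
    if total == 0:
        raise ValueError("image has no pixels")
    black = sum(1 for row in image for pixel in row if pixel == 0)
    return 100.0 * black / total


def p_err(binary_image, ground_truth_image):
    """Assumed form of Equation (1): |PB_GT - PB_bin|."""
    return abs(black_pixel_proportion(ground_truth_image)
               - black_pixel_proportion(binary_image))


# Hypothetical 4x4 toy images (0 = black, 1 = white).
ground_truth = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
photographed = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 1, 1], [0, 1, 1, 1]]

print(black_pixel_proportion(ground_truth))  # proportion of black pixels in the reference
print(p_err(photographed, ground_truth))     # distance between the two proportions
```

Since only proportions are compared, the scanned reference and the photographed image do not need to have the same dimensions in this sketch.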

Normalized Levenshtein Distance ([L_dist])
The second quality measure is the Optical Character Recognition (OCR) correctness rate measured by [L_dist] [11], which is the Levenshtein [71] distance normalized by the number of characters in the text. Google Vision OCR was used to obtain the machine-transcribed text. It is important to note that Google Vision automatically detects the input language and applies dictionary-based post-processing, which cannot be deactivated. The Levenshtein distance, here denoted by L_dist, expresses the number of character insertions, deletions and replacements that would be necessary to convert the recognized text into the manually transcribed reference text for each image. Thus, the L_dist depends on the length of the text and cannot be used as an absolute value to measure the performance across different documents. In [11], a normalized version of the L_dist was proposed, calculated as:

\[ [L_{dist}] = \frac{\#char - L_{dist}}{\#char} \qquad (2) \]

where #char is the number of characters in the reference text. The DocEng 2022 binarization competition for photographed documents presented a new challenging dataset in which complex shaded areas were introduced. Although the P_err quality measure worked well whenever the shaded area was more uniformly distributed, in those more complex multi-shaded documents, some algorithms may concentrate the pixels around some characters (e.g., by dilation) while completely removing other parts of the document. This could generate an image that has the same proportion of black pixels as the ground-truth and a clear background with no evident noise, but whose text is unreadable. Taking, for instance, an example image taken with the Apple iPhone SE 2 of a deskjet printed document with the strobe flash off (Figure 2a), the algorithm with the closest black pixel proportion would be DiegoPavan fed with the original color image. The result is presented in Figure 2b. Note that even the remaining dilated letters are nearly unreadable, giving an [L_dist] of nearly zero, meaning almost no text was transcribed.
A P_err close to zero means the proportion of black pixels is very close to that of the ground-truth. If one ignores the P_err and only sorts the results by [L_dist], the most recommended algorithm would be dSLR, having the original color image as input. The result of such binarization is presented in Figure 2c for the same image. Nearly all the text was successfully transcribed ([L_dist] close to 1.0); however, there is a large noisy area in the bottom-left corner, which did not significantly affect the transcription only because of the large margins of the document. Such noise was generated by a shadow of the mobile phone and could not be detected by the [L_dist] measure, but the P_err makes it clear that a large amount of noise is present. A printed document usually has nearly 5% of text pixels (in this image, it was 3.77%), thus a difference of 8.79 from the ground-truth is a large one. If one only wanted to transcribe the text, it could be enough to use such an algorithm for that image; however, if the margins were smaller or the binarized document were to be printed, such a large noise blob would be unacceptable.
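The normalized Levenshtein measure discussed in this section can be sketched as below. It is an illustrative implementation only, assuming the normalization (#char − L_dist) / #char, which is consistent with values close to 1.0 for nearly perfect transcriptions and close to zero when almost no text is recognized; the function names and sample strings are hypothetical.

```python
def levenshtein(source, target):
    """Number of character insertions, deletions and replacements turning source into target."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement or match
        previous = current
    return previous[-1]


def normalized_l_dist(recognized, reference):
    """Assumed normalization: (#char - L_dist) / #char, with #char = len(reference)."""
    n_char = len(reference)
    if n_char == 0:
        raise ValueError("reference text is empty")
    return (n_char - levenshtein(recognized, reference)) / n_char


# Hypothetical OCR output with one wrong character out of twelve.
print(normalized_l_dist("binarizatian", "binarization"))
```

The sketch does not clamp the result, so an OCR output much longer than the reference text would give a negative value.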

Pixel Proportion and Levenshtein Measure (PL)
In order to obtain the best OCR quality while providing visually pleasant human-readable binary document images, a new quality measure is proposed here:

\[ PL = [L_{dist}] \times (100 - P_{err}) \qquad (3) \]

Applying such a new measure to the already presented examples of document images would yield PL = 5.69 for DiegoPavan-C and PL = 84.82 for dSLR-C, while the best algorithm according to the proposed quality measure, Yasin-R, yields PL = 90.22. The corresponding image is presented in Figure 2d, and it has a better overall visual quality and OCR transcription rate, although the dSLR algorithm is an order of magnitude faster than the other two algorithms.
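To make the behaviour of the combined measure concrete, the sketch below assumes that PL multiplies the normalized Levenshtein measure by (100 − P_err); this form is an assumption, chosen because it is consistent with the PL values quoted above. The candidate names and their ([L_dist], P_err) pairs are hypothetical and are not measured results.

```python
def pl_measure(l_dist_normalized, p_err):
    """Assumed combination (not confirmed by the text): PL = [L_dist] * (100 - P_err)."""
    return l_dist_normalized * (100.0 - p_err)


# Hypothetical outcomes for one image, as ([L_dist], P_err) pairs.
candidates = {
    "right-proportion-unreadable": (0.05, 0.5),  # pixel proportion matches, text lost
    "readable-but-noisy": (0.95, 10.0),          # good OCR, large noisy area
    "readable-and-clean": (0.95, 1.0),           # good OCR, little noise
}

scores = {name: pl_measure(ld, pe) for name, (ld, pe) in candidates.items()}
best = max(scores, key=scores.get)
print(best)
```

Under this assumption, an image is only rewarded when both the transcription and the black-pixel proportion are good, which is the behaviour the two failure cases discussed earlier call for.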

TIFF Group 4 Compression Rate (CR_G4)
This work also assessed the size of the monochromatic image files compressed using the Tag Image File Format Group 4 (TIFF_G4) with run-length encoding (RLE), a new quality measure for monochromatic images recently introduced in [16]. Such a compression scheme is part of the Facsimile (FAX) recommendation and was implemented in most FAX systems at a time when transmission resources were scarce. The TIFF_G4 file format is possibly the most efficient lossless compression scheme for binary images [5]. One central part of such an algorithm is to apply run-length encoding [15]. Thus, the less salt-and-pepper noise present in the binary image, the longer the sequences of same-color bits, yielding a smaller TIFF_G4 file, which demands less bandwidth for network transmission and less storage space for archiving. The compression rate, denoted by CR_G4, is calculated from S_G4, the size of the compressed TIFF_G4 file, and S_PNG, the size of the Portable Network Graphics (PNG) file compressed with compression level 4. It is important to remark that such a measure should not be used as an isolated quality measure, but only to re-rank the algorithms with the best PL, as it provides a secondary fine-grained quality measure.
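The effect of salt-and-pepper noise on run-length encoding can be illustrated with the short sketch below. It applies plain one-dimensional run-length encoding to a single row of pixels and is only a simplified stand-in for the idea, not the TIFF_G4 codec itself; the rows are hypothetical toy data (1 = white, 0 = black).

```python
from itertools import groupby


def run_lengths(row):
    """Encode one row of pixels as (value, run length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(row)]


# A clean row: white margin, a black stroke, white margin.
clean_row = [1] * 8 + [0] * 4 + [1] * 8

# The same row with two isolated black noise pixels in the white areas.
noisy_row = list(clean_row)
noisy_row[3] = 0
noisy_row[15] = 0

print(len(run_lengths(clean_row)))  # number of runs without noise
print(len(run_lengths(noisy_row)))  # number of runs with noise
```

Each isolated noise pixel splits one long run into three shorter ones, so the encoder has more runs to store, which is why a noisier binary image tends to produce a larger compressed file.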

Processing Time Evaluation
The viability of using a binarization algorithm in a document processing pipeline depends not only on the quality of the final image, but also on the processing time taken by the algorithm and the maximum amount of memory required during the process. To the best knowledge of the authors, the first assessment of binarization algorithms to take the average processing time into account was [9]. The assessed algorithms were implemented by their authors using several programming languages and operating systems, running on different platforms; thus, the processing time figures presented here provide only the order of magnitude of the time taken to binarize the whole dataset. The training times for the AI-based algorithms were not computed. Two processing devices were used. The algorithms were implemented using two operating systems and different programming languages for specific hardware platforms such as GPUs. The algorithms were executed on different operating systems (OS), but on the same hardware. For those that could be executed on both OS types, the processing time for each OS was measured and no significant difference was noticed. This is expected based on previous experimentation [11]. The mean processing time was used in the analysis. As already mentioned, the primary purpose is to provide the order of magnitude of the processing time.

Quality, Space and Time Evaluation
For each of the six devices studied, this paper assesses the performance of the 340 binarization schemes listed applied to photographed documents, with the strobe flash on and off, in two different ways:

1. Best quality-time and compression: applies the ranking by summation, followed by sorting by processing time, but clustering by device and observing the compression rate for the top-rated algorithms.

2. Image-specific best quality-time: makes use of PL and [L_dist]. The ranking is performed by first sorting according to the quality measure and, when the quality results are the same, sorting by processing time. This is illustrated in Figure 3.
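The quality-time ordering of the second item can be sketched as a sort on two keys. The algorithm names, quality values and times below are hypothetical; the sketch compares the values exactly as given, since any rounding applied before checking whether two quality results are the same is not specified here.

```python
def quality_time_ranking(results):
    """Sort (algorithm, quality, seconds) records: best quality first, ties broken by shorter time."""
    return sorted(results, key=lambda item: (-item[1], item[2]))


# Hypothetical records: (algorithm, quality measure, mean processing time in seconds).
results = [
    ("AlgA", 0.97, 12.0),
    ("AlgB", 0.97, 0.5),
    ("AlgC", 0.99, 30.0),
]

for name, quality, seconds in quality_time_ranking(results):
    print(name, quality, seconds)
```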
The ranking summation was first applied to binarization in the Document Image Binarization Competition (DIBCO) series [72] and has since been used in many subsequent competitions and assessments [9]. Figure 4 presents a visual description of this criterion. First, the algorithms are ranked for each image individually; then, the ranking positions are summed across the images, composing the score of each algorithm. The final ranking is determined by sorting the algorithms by score, and the global mean over all images is presented to provide a quantitative overall ordering.
Sorting directly by the mean of the quality measure gives less precise results, as one seeks here the algorithm that most frequently appears at the top of the ranking, which does not necessarily mean that it has the best quality every time. In the example of Figure 4, if one sorted by the [L_dist] mean alone, the Li-Tam algorithm would be the top-ranked, as for Image 2 its [L_dist] is higher than that of most of the other algorithms, raising its mean value. However, it only appears as the top algorithm for that single image. For most images, Moments is better ranked, indicating that for any given image in such a dataset, Moments may provide better results.
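The difference between the two criteria can be sketched with a small example. The algorithm names and quality values below are hypothetical and are not the figures shown in Figure 4; ties within an image are not given any special treatment in this sketch.

```python
def ranking_summation(scores_per_image):
    """Sum, over all images, the rank position of each algorithm (1 = best); lower totals are better."""
    totals = {}
    for scores in scores_per_image.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, algorithm in enumerate(ordered, start=1):
            totals[algorithm] = totals.get(algorithm, 0) + position
    return totals


def mean_quality(scores_per_image):
    """Mean quality of each algorithm over all images; higher is better."""
    sums = {}
    for scores in scores_per_image.values():
        for algorithm, value in scores.items():
            sums[algorithm] = sums.get(algorithm, 0.0) + value
    return {algorithm: total / len(scores_per_image) for algorithm, total in sums.items()}


# Hypothetical quality values; AlgB wins one image by a wide margin.
scores_per_image = {
    "image1": {"AlgA": 0.90, "AlgB": 0.88, "AlgC": 0.80},
    "image2": {"AlgA": 0.50, "AlgB": 0.95, "AlgC": 0.45},
    "image3": {"AlgA": 0.92, "AlgB": 0.90, "AlgC": 0.85},
}

totals = ranking_summation(scores_per_image)
means = mean_quality(scores_per_image)
print(min(totals, key=totals.get))  # winner by ranking summation
print(max(means, key=means.get))    # winner by mean quality
```

In this toy data, a single image with a large margin is enough to put one algorithm ahead by the mean, while the ranking summation favors the algorithm that is ranked first on most images.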
The simple mean sorting method is applicable to the image-specific assessment, as the aggregated images have very similar features (capturing device and print type). For the device-level assessment, the different printing types are aggregated to give an overall result for each device, increasing the variability and making the ranking summation more appropriate.

Figure 4. Example of sorting by the ranking summation criterion. The algorithm marked in red (Moments-R) is the overall best according to this criterion.

Choosing the Best Channel
The recent paper [17] showed that there may be a quality difference in feeding a binarization algorithm with the original color image, its grayscale equivalent (using the luminance formula), or the red, green or blue channel. That fact is important, as having one of the input channels as the best-quality result would save processing space and, consequently, processing time, while the grayscale image demands extra processing time, which may be significant for the faster algorithms. Ideally, one would analyze the best channel for each different type of image; however, for the sake of simplicity, in this study, only the input channel which provided the best PL summation ranking was chosen for each algorithm. In several cases, there was a nearly equal quality result between the red or blue channels and the color image. In some other cases, providing a single channel actually increased the final quality and the channel that more often provided better quality was the red channel. Thus, whenever an algorithm yields similar quality results having the full color image and one of the channels as input, the red channel is chosen, as that often means less processing time and space.
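The five input versions discussed in this section can be sketched as follows. The luminance weights used here (0.299, 0.587, 0.114) are an assumption, since the exact luminance formula is not stated in this section; the image is represented as rows of (R, G, B) tuples, and the keys follow the suffix letters used in this paper (R, G, B, L, C).

```python
def input_versions(rgb_image):
    """Return the five input versions of an image given as rows of (R, G, B) tuples."""
    red = [[r for r, g, b in row] for row in rgb_image]
    green = [[g for r, g, b in row] for row in rgb_image]
    blue = [[b for r, g, b in row] for row in rgb_image]
    # Assumed luminance weights; the paper only refers to "the luminance formula".
    luminance = [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
                 for row in rgb_image]
    return {"C": rgb_image, "R": red, "G": green, "B": blue, "L": luminance}


# Hypothetical 2x2 color image.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]

versions = input_versions(image)
print(versions["R"])
print(versions["L"])
```

Selecting a single channel only copies values, whereas the luminance version needs three multiplications per pixel, which illustrates why the grayscale conversion adds processing time.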
Six of the best-ranked algorithms are presented in Table 3 with their respective average PL and the score of the ranking summation, stressing that the lower the score, the better the algorithm. The algorithm by Singh was one of the few for which the blue channel offered better results. Among the best algorithms, Sauvola was the one with the greatest difference between applying a single channel and the original color image.

Results
For each device model, with the in-built strobe flash on and off, the binarization algorithms were evaluated in two contexts: clustering by the specific image characteristics, and aggregating the whole dataset (global evaluation). In all results, the letter after the original algorithm name indicates the version of the image used: R-red; G-green; B-blue; L-luminance; C-original color image. The mean processing time was taken to evaluate the order of magnitude of the time complexity of the algorithms, thus minor time differences are not relevant to this study. The grayscale conversion time was not considered here. Table 4 presents the results for each device using the ranking summation strategy. YinYang22 and Michalak21a are often among the top 5 for any of the tested devices. For Samsung Note 10+, only HuangUNet presented significant improvement using a single channel other than red. For Samsung S21 Ultra 5G, ElisaTV presented good results compared to recent efficient ones such as YinYang22. For Motorola G9, Michalak21a would be recommended either with flash on or off, due to high quality and low processing time. For Samsung A10S, Michalak21a would also be the one recommended. For Samsung S20, even the most classical algorithm (Otsu) could properly binarize photos taken with flash on. It is important to notice that Dataset 2 has less complex images than Dataset 1. For Apple iPhone SE 2 and flash on, which also used Dataset 2, Otsu again appeared as recommended.
The detailed results for each device are presented in Tables 5-8. The quality-time criterion was used (Figure 3), as the variation in image characteristics is lower, and thus the standard deviation is small enough to allow a fair assessment. It is important to remark that the standard deviation (SD) of the [L_dist] for the Laser and Deskjet datasets was, for all the top 5 and nearly all the other algorithms, approximately 0.04, and for the book dataset it was 0.01, being in some cases close to zero. Only for the devices Samsung S21 Ultra 5G and Samsung Note 10+ was there a more significant variation, with the standard deviation varying from 0.1 to 0.3. Those results demonstrate that the top five algorithms for all test datasets provide excellent binarization results for OCR in general.
The PL standard deviation was higher due to a higher variation of the P_err measure, which is part of it. For all devices, the SD of the Deskjet and Laser datasets was approximately 4.00, while for the book dataset, it was under 1 for the devices Motorola G9, Samsung S20 and Samsung A10S, and between 1 and 3 for the devices Samsung Note 10+, Apple iPhone SE 2 and Samsung S21 Ultra 5G. The overall quality perceived by visually inspecting the resulting images produced by the top-ranked algorithms is good.
In order to choose the most suitable algorithm for some specific application, the first thing to consider is the intrinsic characteristics of the printing, as different types of ink and printing methods imply entirely different recommendations, as shown in the tables of results. If the document was printed with a deskjet device, it is recommended to check whether the strobe flash should be on or off prior to the image acquisition. After that, the binarization algorithm with the best quality-time balance must be applied. If an application has no significant time constraint, but the quality is so crucial that even a small amount of lost information is not acceptable, one should choose the algorithm with the top quality in the quality-time ranking. However, if the image binarization is part of an embedded application, its processing time is a crucial factor, thus the best quality-time trade-off must be chosen.
Two quality measures were used to support the decision for two types of applications: OCR transcription and printing, archiving or transmission through computer networks. For the first application (OCR transcription), the [L_dist] measure should be used, as it does not take into account the visual quality, but only the OCR precision, giving the algorithms with the best chance to provide the best transcription possible. For the second application, the visual quality is also important, thus the PL measure is used, which allows the choice of the best algorithm for OCR transcription and, at the same time, for printing or transmitting.
In general, keeping the strobe flash on or off does not imply any significant difference in the quality of the best-ranked algorithms; however, in most cases, the set of recommended algorithms varies across the devices. For instance, using Samsung S21 Ultra 5G, the algorithms recommended for deskjet printed documents are similar whether one keeps the flash on or off, but they are completely different for book offset-printed documents. The same happens for most other devices, either using the [L_dist] or the PL measure when comparing different setups. This fact highlights the importance of considering as many algorithms as possible, as in some cases, one algorithm that offers excellent results with one configuration may have totally different results with a different set of capturing conditions, devices and setup.
In the results tables for the [L_dist] measure, the first line, in red, represents the performance of applying the original color image directly to Google Vision OCR without prior binarization. In most cases, the results are equivalent to the performance of providing a binary image. However, for the Motorola G9 and Apple iPhone SE 2, no OCR output is given for most of the captured images. The standard deviation in all cases was nearly zero, which means there were almost no results for the images. This shows that general-purpose OCR engines can be greatly improved when provided with a clean binary image.
In several cases, the recommended algorithms for OCR ([L_dist]) match the recommendations using the PL measure with the same input channel or a different one. For instance, using Wolf-R to binarize laser documents with flash off captured by the Samsung S21 Ultra 5G yields not only excellent OCR results, but also good visual quality images. If one checks the example binary image using that algorithm in Figure 5b, it is possible to observe how well this algorithm performed, generating a clear binary image with nearly no noise.
It is remarkable how classical global algorithms such as Otsu, dSLR and WAN were quality-time top-ranked, but only when using the in-built strobe flash on. This happened because the flash was sufficient to diminish the shadows and allow those global algorithms to work well, and it highlights that very simple and fast algorithms can still be used for uniform images, even if photographed in different places and by different smartphones. Figures 5 and 6 present some example images. For each input color image, one of the most recommended algorithms is used, according to the global ranking of Table 4. The cropped portion of the image shows the critical regions where shadows and the flash light reflection can be noticed. For nearly all images, an almost perfect binary image was generated. Only in Figure 5c is it possible to see some noise due to the strong flash light reflected on the laser-printed page. The laser printing process creates a surface that reflects more light than other types of printing, thus even in the color image, some pixels inside the text stroke are very close to the background ones, making it almost impossible to generate a perfect binary image. No algorithm tested here did better than that, which highlights a possible problem to be solved by future proposals.

Conclusions
Document binarization is a key step in many document processing pipelines, which demands both quality and time performance. This paper analyses the performance of 68 binarization algorithms on images acquired using six different smartphone models from three different manufacturers, all widely used today. The quality, size and processing time of the binarization algorithms are assessed. A novel quality measure is proposed that combines the Levenshtein distance with the overall visual quality of the binary image. The mean compression rate of the TIFF G4 file with RLE compression was also analyzed; it also serves as a quality indicator, as salt-and-pepper noise in the final image degrades file compression performance.
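The link between salt-and-pepper noise and compressed file size can be illustrated with a minimal sketch that counts the runs of identical pixels per scanline. This run count is only a crude proxy chosen for the example: it is not the TIFF G4 codec, and the function names and the list-of-rows image format are assumptions.

```python
def count_runs(row):
    """Number of maximal runs of identical pixels in one scanline."""
    if not row:
        return 0
    return 1 + sum(1 for a, b in zip(row, row[1:]) if a != b)


def total_runs(image):
    """Sum of run counts over all scanlines of a binary image (list of rows)."""
    return sum(count_runs(row) for row in image)


# A clean 4x16 image: each row is one black run followed by one white run.
clean = [[0] * 8 + [1] * 8 for _ in range(4)]

# The same image with two isolated noise pixels.
noisy = [row[:] for row in clean]
noisy[1][3] = 1   # white speck inside the black area
noisy[2][12] = 0  # black speck inside the white area

print(total_runs(clean), total_runs(noisy))
```

Each isolated noise pixel splits one run into three, so a run-length-based coder has more symbols to encode. That is the intuition behind using the compression rate as an indirect indicator of residual noise.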
The results were presented from two perspectives: a detailed evaluation considering the device, the in-built strobe flash state (on or off), and the printing technology (deskjet, laser, or offset); and a device-based evaluation considering the visual quality and the compressed binary image file size.
Several conclusions may be drawn from the presented results:
1. Keeping the strobe flash on or off does not by itself imply a better-quality image; one needs to make the right choice of binarization algorithm in order to obtain the best monochromatic image.
2. The ranking order is almost completely different across the possible setups, which reinforces the claim that no binarization algorithm is good for all document images.
3. The quality of the images yielded by the top-rated algorithms with the offset-printed documents (book) dataset is almost perfect when considering the precision of the OCR transcriptions.
4. In several cases, as for the Apple iPhone SE 2, some global algorithms had the best performance. They are much faster than the newer algorithms and, in some rare cases, even generate cleaner images (better PL).
5. Even when not in the top rank, newer algorithms such as Michalak or YinYang and their variants are dominant in the results. It is important to stress that they were developed with photographed documents as their target, while most of the other algorithms, especially the global ones, were developed for scanned document images.
6. If the compression rate is a priority, YinYang22, with any of the input versions of the image, would be the most recommended algorithm overall, as it offers the best compression rates while maintaining high quality.
7. If processing time is a priority, Michalak21a with the red channel would be the most recommended algorithm overall, as it requires a small processing time, comparable to that of the classical algorithms, while providing high-quality binary images.
8. This paper also shows that the PL measure provides a better overall quality evaluation of binarization algorithms.
9. Analyzing the TIFF G4 compression rate with RLE has also proved valuable, as, on several occasions, two algorithms provided similar quality results, but one of them could be twice as efficient under this compression scheme.
10. None of the tested algorithms could perfectly binarize the regions of the laser-printed documents in which the strobe flash (whenever on) created strong noise in the central region of the image, which suggests that such a set-up should be avoided when photographing laser-printed documents.
The recent paper [18] changes the outlook from the document to the device: if one had to build a single binarization algorithm into an embedded application handling document images, which would it be? That algorithm would have to be light and fast enough to yield good quality-space-time performance. Following that approach and looking at Table 4, one could recommend the following algorithms for each device:
No doubt the list above may vary, as visual inspection carries some degree of subjectivity among time performances of around the same order of magnitude.
The authors of this paper recently became aware of reference [73], in which the authors look at the impact of color-to-gray conversion algorithms on binarization. Besides the binarization performance of the color-to-gray CIE Y (International Commission on Illumination luminance channel) conversion algorithm (assessed here), reference [73] looks at five other algorithms. It proposes two new schemes focusing on the quality of the final monochromatic image and makes a global assessment on scanned documents. The analysis of the performance of such color-to-gray conversion algorithms on photographed documents is left for further work.
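For readers unfamiliar with color-to-gray conversion, the following minimal sketch contrasts taking a single color channel with computing the CIE Y luminance of a pixel. It assumes 8-bit sRGB input and uses the standard sRGB linearization with the Rec. 709 luminance weights; whether the conversion assessed in this paper uses exactly these coefficients is not stated in this section, and the function names and dictionary keys are illustrative only.

```python
def srgb_to_linear(channel_8bit: int) -> float:
    """Undo the sRGB transfer curve for one 8-bit channel (result in 0..1)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def cie_y(rgb):
    """Relative luminance Y of an sRGB pixel (assumed Rec. 709 weights)."""
    r, g, b = (srgb_to_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def to_gray_versions(rgb):
    """Illustrative gray-scale versions of one pixel: each channel and the luminance."""
    r, g, b = rgb
    return {
        "red": r,
        "green": g,
        "blue": b,
        "luminance": round(255 * cie_y(rgb)),  # linear Y rescaled to 0..255
    }
```

The different weights show why the choice of input version matters to a binarization algorithm: a saturated green pixel is much brighter in luminance than a saturated blue one, while a single-channel version ignores the other two channels entirely.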
Another important point also left for further work is setting the in-built strobe flash to auto mode, in which the device itself decides, depending mainly on the amount of light in the environment, whether or not the flash is activated.

Funding: The research reported in this paper was partly sponsored by The MEC Essay Project-Automatically Assessing Handwritten Essays in Portuguese from the Ministry of Education of Brazil and the RD&I project Callidus Academy signed between the Universidade do Estado do Amazonas (UEA) and Callidus Indústria through the Lei de Informática/SUFRAMA. Rafael Dueire Lins was also partly sponsored by CNPq-Brazil.

Data Availability Statement:
The results presented here made use of the IAPR (International Association on Pattern Recognition) DIB-Document Image Binarization dataset, available at: https://dib.cin.ufpe.br, accessed on 17 January 2023.